The Structural Analysis of Programming Languages



B. J. MacLennan 



1. Introduction

It is common to find articles in the programming language 
literature riddled with unsupported claims. Words and phrases, 
such as 'better', 'simpler', 'more structured' and 'less error 
prone', are used with abandon. If we were selling aspirin and 
made such unsupported claims, we would probably be sued. We 
clearly need more precise ways of measuring our languages. 

A language's structures are some of its most important 
characteristics. These include the data structures: those 
mechanisms that the language provides for organizing elementary 
data values. They also include the control structures, which 
organize the control flow. Less obviously, they include the name 
structures, which partition and organize the name space. 

Languages can be compared relative to their structures in
the data, control and name domains (and others, such as the
syntactic domain). To make this comparison precise, we need a
precise method of describing the structural properties of a
language. Further, this method should be syntax independent; it
should "look through" the syntax of a language to its underlying
structure. In the next section we discuss a means by which
programming language structures can be described.

2. Describing Structure

The number of different structures that a programmer can use
is essentially unlimited. For instance, there are an infinite
number of ways he can organize his data or control flow. Since
programming languages are finite, there must be some finite means
of generating this infinite number of structures.

The means, of course, is to have some number of primitive 
structures and some number of constructor functions which take 
existing structures and compose them into new structures. For 
instance, Pascal data types are built by applying the data type 
constructors (array, record, set, etc.) to the primitive data 
types (real, integer, char, etc.). This results in hierarchical 
structures. Similarly, control flows may be organized by apply- 
ing the control flow constructors ('sequence', 'if,' and 'while') 
to the control flow primitives (those constructs that do not 
alter the control flow). 
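The way a finite stock of primitives and constructors generates an
unbounded family of structures can be sketched in a few lines of
Python. This is only an illustration: the three primitives are a
subset of Pascal's, the arities given for 'set' and 'array' are our
own simplification, and no Pascal legality rules are enforced (for
example, the sketch happily forms a set of reals).

```python
from itertools import product

# Illustrative subset of primitives and constructors (assumption, not a
# complete or faithful model of Pascal).
PRIMITIVES = ["real", "integer", "char"]
CONSTRUCTORS = {"set": 1, "array": 2}  # constructor name -> number of arguments


def generate(depth):
    """Return every type expression whose nesting depth is at most `depth`."""
    types = set(PRIMITIVES)
    for _ in range(depth):
        new = set(types)
        for name, arity in CONSTRUCTORS.items():
            # Apply the constructor to every combination of existing types.
            for args in product(sorted(types), repeat=arity):
                new.add(f"{name}({', '.join(args)})")
        types = new
    return types


for d in range(3):
    print(d, len(generate(d)))
```

Each extra level of nesting multiplies the number of expressible
types, so no finite depth exhausts the structures available.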

The hierarchical application of constructors to primitives
is the most common method of building structures. Thus, we can
use this as a starting point for our analysis of structures. As
a first approximation, we can compare the complexity of the
structures of two programming languages by comparing the
number of primitives and constructors in each. For instance, we
can see from Table 1 that Pascal has 5 primitive data types and 7
data type constructors.



TABLE 1. Data Structures

Language    Count             Members
---------   ---------------   ------------------------------------------
Pascal      5 primitives      real, integer, Boolean, char, text
            7 constructors    subrange, enumeration, set, array, file,
                              pointer, record

Algol-60    3 primitives      real, integer, Boolean
            1 constructor     array

Lisp 1.5    1 primitive       atom
            1 constructor     list

Algol-68    11 primitives     int, real, bool, char, format, compl, bits,
                              bytes, string, sema, file
            6 constructors    long, ref, array, struct, union, proc



Since Algol-60 has 3 primitives and 1 constructor, it is probably
simpler than Pascal. Conversely, since Algol-68 has 11 primitives
and 6 constructors, it is likely to be more complex. However,
the number of primitives and constructors is not the entire
story.
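The first-approximation comparison can be reproduced mechanically
from the entries of Table 1. The sketch below simply adds the number
of primitives to the number of constructors; treating that sum as a
single score is our own simplification for illustration, not a
measure proposed in this paper.

```python
# Entries transcribed from Table 1: language -> (primitives, constructors).
TABLE_1 = {
    "Pascal": (
        ["real", "integer", "Boolean", "char", "text"],
        ["subrange", "enumeration", "set", "array", "file", "pointer", "record"],
    ),
    "Algol-60": (["real", "integer", "Boolean"], ["array"]),
    "Lisp 1.5": (["atom"], ["list"]),
    "Algol-68": (
        ["int", "real", "bool", "char", "format", "compl", "bits",
         "bytes", "string", "sema", "file"],
        ["long", "ref", "array", "struct", "union", "proc"],
    ),
}


def size(language):
    """Crude size score: primitives plus constructors (an assumption)."""
    primitives, constructors = TABLE_1[language]
    return len(primitives) + len(constructors)


ranking = sorted(TABLE_1, key=size)
for language in ranking:
    print(language, size(language))
```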

A significant aspect of the structuring mechanisms provided
by a language is the complexity of the inter-relationships among
the primitives and constructors. For instance, if the output of
every constructor is a legitimate input to every constructor, and
every primitive is a legitimate input to every constructor, then
the system will be more regular than if this is not the case.
This is often called 'orthogonality'. It is also part of what is
involved when we call a language 'structured'. In the next
section we will develop means for analyzing these relationships.
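One rough way to make the idea of regularity concrete is to ask what
fraction of all (constructor argument, kind of type) combinations is
legal. The sketch below does this for four Pascal constructors, using
the argument restrictions as they appear in the semantic grammar of
Figure 1. The ratio itself, the names SLOTS and regularity, and the
omission of records are our own illustrative assumptions; this is not
a metric defined in the paper.

```python
from fractions import Fraction

# Kinds of type distinguished by the grammar of Figure 1.
KINDS = ["integer", "real", "scalar", "subrange", "Boolean", "char",
         "array", "record", "set", "file", "pointer"]

# Kinds accepted wherever the grammar demands an index-type.
INDEX_KINDS = {"scalar", "subrange", "Boolean", "char"}

# Each constructor argument ("slot") and the kinds it accepts.
SLOTS = {
    "array.index": INDEX_KINDS,
    "array.element": set(KINDS),
    "set.element": INDEX_KINDS,
    "file.element": set(KINDS),
    "pointer.target": set(KINDS),
}


def regularity(slots, kinds):
    """Fraction of (slot, kind) pairs that are legal; 1 means fully orthogonal."""
    legal = sum(len(accepted) for accepted in slots.values())
    return Fraction(legal, len(slots) * len(kinds))


print(regularity(SLOTS, KINDS))
```

A system in which every slot accepts every kind would score exactly 1;
the restrictions on array indices and set elements pull this fragment
below that.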

3. Data Structures
3.1 Semantic Grammars

We will begin with data structures to illustrate our tech- 
nique for analyzing structure. Our goal is to analyze the 
interrelationships among the primitives and constructors of a 
system of data structures. How are we to go about this? We can 
begin by looking at syntax because, in most languages, there is a 
close relation between the syntax and the structures it embodies 
(i.e., form follows function). In particular, there will usually 
be exactly one syntactic construct for each data primitive. Con- 
sider Pascal. We can see from Table 1 that the primitives are 
denoted by the predefined type identifiers, 'integer', 'Boolean', 
'real', 'char' and 'text'. There are constructors for enumera- 
tions, subranges, sets, arrays, records, files and pointers. We 
know that these are constructors because each can generate a 
potentially unlimited number of structures (types). Since the
Pascal grammar tells us what syntactic entities can go together,
this will be a big help in deciding what semantic entities can go
together.

Consider the array type. We can write its syntax as 

array-type ::= array [ index-type ] of type 

The index-type must be a type isomorphic to a subrange of the
integers. Syntactically, this can take the form:

index-type: scalar-type | subrange-type | type-identifier

scalar-type: ( identifier ,...) 

subrange-type: constant .. constant 

What we are interested in, however, is the semantics of the array
constructor. Since we know that the index type must be isomorphic
to a subrange of the integers, we know that the type-identifier
must either name a scalar-type or a subrange-type or
one of the predefined finite discrete-types, Boolean and char.
Also, a subrange must be constructed from a discrete constant
(i.e., an integer, or an element of a scalar or finite discrete
type). We can write this as a "semantics-oriented grammar":

array-type: array [ index-type ,... ] of type

index-type: scalar-type | subrange-type | discrete-type

scalar-type: ( identifier ,...)

subrange-type: constant .. constant

discrete-type: Boolean | char

One further simplification can be made here. Recall that in Pascal

array [ i, j ] of t

is just an abbreviation for

array [i] of array [j] of t 

Thus, without loss of generality, the definition of array-type 
can be written 

array-type: array [ index-type ] of type 

We have not altered the syntax; we have just eliminated some syn- 
tactic sugar. The semantics of most of the rest of Pascal's con- 
structors closely follows their syntax. 

If we are to be able to compare structures in different 
languages, we must obviously ignore any syntactic differences 
that exist between them. This we can do by writing the grammar 
in a neutral, functional form. For instance, for arrays: 

array-type: array (index-type, type) 

index-type: scalar-type | subrange-type | discrete-type 

scalar-type: scalar ( identifier + ) 

subrange-type: subrange (constant, constant) 

discrete-type: Boolean | char 
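The neutral, functional form maps directly onto a small family of
record types. The following Python sketch is one possible encoding of
the array fragment above; the class names, the restriction of
subrange constants to integers, and the render helper are our own
assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Scalar:
    identifiers: Tuple[str, ...]  # scalar (identifier+)


@dataclass(frozen=True)
class Subrange:
    low: int   # constants are restricted to integers here (assumption)
    high: int


@dataclass(frozen=True)
class Discrete:
    name: str  # "Boolean" or "char"


@dataclass(frozen=True)
class Real:
    pass


# index-type: scalar-type | subrange-type | discrete-type
INDEX_TYPES = (Scalar, Subrange, Discrete)


@dataclass(frozen=True)
class Array:
    index: object
    element: object

    def __post_init__(self):
        # Enforce the semantic rule that only an index-type may index an array.
        if not isinstance(self.index, INDEX_TYPES):
            raise TypeError("array index must be an index-type")


def render(t):
    """Write a type back out in the functional notation of the grammar."""
    if isinstance(t, Array):
        return f"array ({render(t.index)}, {render(t.element)})"
    if isinstance(t, Scalar):
        return f"scalar ({', '.join(t.identifiers)})"
    if isinstance(t, Subrange):
        return f"subrange ({t.low}, {t.high})"
    if isinstance(t, Discrete):
        return t.name
    if isinstance(t, Real):
        return "real"
    raise TypeError(f"not a type in this fragment: {t!r}")


print(render(Array(Discrete("char"), Array(Discrete("Boolean"), Real()))))
```

Because the index restriction is checked when the value is built, an
expression such as Array(Real(), Real()) is rejected, mirroring the
fact that the grammar cannot generate it.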

3.2 Interpretation

Now, let us make some observations about these rules. Consider
a typical string generated by this grammar:

array (char, array (Boolean, real))

This string describes a particular Pascal data type. Now suppose
BOOLEAN = { true, false } is the set of all Boolean values and
REAL is the set of all real values. Then, the set of all arrays
with Boolean indices and real elements is just the set of
functions mapping BOOLEAN into REAL: [ BOOLEAN -> REAL ].
Therefore, we can see that the string shown above describes the
set of data values:

[ CHAR -> [ BOOLEAN -> REAL ] ]

where CHAR is the set of all character values.

This suggests that we can define an interpretation function,
I, that associates a set of data values with each string
generated by the grammar. This can be defined recursively:

I [ array (t, t') ] = [ I [ t ] -> I [ t' ] ]

I [ scalar (i_1, ..., i_n) ] = { i_1, ..., i_n }

I [ subrange (C, C') ] = { x | C ≤ x & x ≤ C' }

I [ Boolean ] = BOOLEAN

I [ char ] = CHAR

I [ real ] = REAL
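For types built only from finite components, the interpretation
function can be computed directly. The sketch below is a minimal
version under stated assumptions: types are encoded as tuples, CHAR
is replaced by a two-character stand-in alphabet, real is left out
because REAL cannot be enumerated, and each array value is
represented as a set of (index, element) pairs.

```python
from itertools import product

BOOLEAN = frozenset({True, False})
CHAR = frozenset("ab")  # stand-in alphabet (assumption), not the real CHAR


def interpret(t):
    """Return the finite set of values denoted by the type expression t."""
    kind = t[0]
    if kind == "Boolean":
        return BOOLEAN
    if kind == "char":
        return CHAR
    if kind == "scalar":      # ("scalar", i_1, ..., i_n)
        return frozenset(t[1:])
    if kind == "subrange":    # ("subrange", C, C') with both bounds included
        return frozenset(range(t[1], t[2] + 1))
    if kind == "array":       # ("array", index-type, element-type)
        domain = list(interpret(t[1]))
        codomain = list(interpret(t[2]))
        # Every function from domain to codomain, as a set of pairs.
        return frozenset(
            frozenset(zip(domain, images))
            for images in product(codomain, repeat=len(domain))
        )
    raise ValueError(f"unknown type constructor: {kind!r}")


inner = ("array", ("Boolean",), ("subrange", 1, 3))
print(len(interpret(inner)))
print(len(interpret(("array", ("char",), inner))))
```

The number of values of an array type is the size of the element set
raised to the size of the index set, as expected of a function space.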

To make this interpretation more obvious, we will write subrange
(C, C') as C..C', and scalar (i_1, ..., i_n) as { i_1, ..., i_n }.
Figure 1 shows the complete Pascal type system using these
conventions.

type: simple-type | structured-type | pointer-type
simple-type: index-type | integer | real
index-type: scalar-type | subrange-type | discrete-type
scalar-type: { identifier+ }
subrange-type: constant .. constant
discrete-type: Boolean | char
structured-type: [packed] unpacked-structured-type
unpacked-structured-type: array-type | record-type | set-type |
                          file-type
array-type: array (index-type, type)
record-type: record ([field*]) [variant-part]
field: field (identifier, type)
variant-part: field (identifier, index-type)
              X (constant X record-type)*
set-type: set (index-type)
file-type: file (type)
pointer-type: pointer (type)

Figure 1. The Pascal Type System

Defining the interpretation for record-type and pointer-type
is quite complicated without the notations of a relational
calculus, so they will not be shown here. The interpretations of
set and file types are easy to define:



I [ set (t) ] = P ( I [ t ] ) 

I [ file (t) ] = I [ t ] * 



where P is the power-set function. 
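Both equations are easy to check on a small value set. In the sketch
below, power_set computes P exactly, while sequences approximates the
star operation by stopping at a maximum length, since the full set of
finite sequences is infinite. The function names and the length bound
are our own assumptions.

```python
from itertools import combinations, product


def power_set(values):
    """All subsets of `values`: the interpretation of set (t)."""
    items = list(values)
    return {
        frozenset(subset)
        for size in range(len(items) + 1)
        for subset in combinations(items, size)
    }


def sequences(values, max_len):
    """All sequences over `values` up to max_len: a bounded view of file (t)."""
    items = list(values)
    return {
        seq
        for length in range(max_len + 1)
        for seq in product(items, repeat=length)
    }


print(len(power_set({True, False})))
print(len(sequences({True, False}, 2)))
```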

It should be noted that the above equations imply structural
equivalence of Pascal types, as opposed to name equivalence. The
Revised Report on Pascal [4] does not define the form of type
equivalence used. It is simple to alter the above definitions to
accommodate name equivalence; we just represent each type by a
pair where the first element of the pair is the type's identifier
and the second element of the pair is the type in the structural
sense. Thus we have:



type: identifier X unnamed-type

unnamed-type: simple-type | structured-type | pointer-type
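The difference between the two forms of equivalence shows up as soon
as two declarations share a structure. In this sketch a type is the
pair described above; the class name NamedType, the tuple encoding of
the structure, and the example identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamedType:
    identifier: str    # first element of the pair: the type's identifier
    structure: tuple   # second element: the type in the structural sense


def structurally_equivalent(a, b):
    """Structural equivalence ignores the identifier."""
    return a.structure == b.structure


def name_equivalent(a, b):
    """Name equivalence compares the whole pair, identifier included."""
    return a == b


celsius = NamedType("celsius", ("subrange", 0, 100))
percent = NamedType("percent", ("subrange", 0, 100))

print(structurally_equivalent(celsius, percent))
print(name_equivalent(celsius, percent))
```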

It should be pointed out that there are limitations to the
descriptive power of this notation. For instance, it does not 
express the fact that the identifiers in scalar-types must be 
distinct, or that type identifiers must be distinct, etc. To 
include all this information would clutter the notation to the 
point of unusability. 

4. Structure Diagrams

We have said that the complexity of a collection of struc- 
tures is reflected by the complexity of the semantic grammar. It 
is still a little difficult to see this complexity in the tradi- 
tional BNF form. For this purpose we have found a diagrammatic 
form enlightening. This is really a dependency graph (showing 
which nonterminals depend on which others) coupled with special 
symbols for various operations, viz.

A*

A+

A X B

A | B | C

[ A | B ]

where [ A | B ] means either A or B or nothing.




In our semantic grammars (as in syntactic grammars) common
structural patterns are factored out and given names. This
reflects the fact that these structural patterns only have to be
learned once. In the structure diagrams this factoring is
represented by an edge that forks and goes to each of the uses of
that structure. For example, since 'index-type' is used both as
a part of 'simple-type' and as a part of array and set types,
the edge from index-type goes to the subgraphs defining each of
these structures. We have adopted the convention of only using
binary forks; since edges represent dependencies, this simplifies
complexity estimation by edge counting.
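A simplified version of edge counting can be carried out on the
grammar itself, by treating each rule as a node and drawing one edge
to every symbol its right-hand side mentions. The sketch below does
this for the array fragment of Section 3.1. It is only an
approximation of the diagrams: it ignores the binary-fork convention
and the special operation symbols, so its counts need not match those
obtained from the figures.

```python
# Array fragment of the semantic grammar: rule -> symbols it depends on.
ARRAY_FRAGMENT = {
    "array-type": ["index-type", "type"],
    "index-type": ["scalar-type", "subrange-type", "discrete-type"],
    "scalar-type": ["identifier"],
    "subrange-type": ["constant"],
    "discrete-type": ["Boolean", "char"],
}


def dependency_edges(grammar):
    """One directed edge per distinct (rule, symbol used by the rule) pair."""
    return {(lhs, rhs) for lhs, symbols in grammar.items() for rhs in symbols}


def edge_count(grammar):
    """Edge count of the dependency graph, used here as a complexity estimate."""
    return len(dependency_edges(grammar))


print(edge_count(ARRAY_FRAGMENT))
```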

Structures from other systems are represented by T-shaped 
terminations. Given this explanation, the reader is encouraged 
to compare the diagram of Pascal's data structures in Figure 2 
with the semantic grammar in Figure 1. The data structures of
LISP, Algol-60, and Algol-68 are diagrammed in Figures 3-5.

Figure 5. The Algol-68 Type System

5. Name Structures

Next, we will demonstrate the application of these tech- 
niques to the name structures, another subsystem of programming 
languages. The name structures of programming languages are 
often described by terms such as "block-structured", "monol- 
ithic", "disjoint", etc. To get a better grasp on these struc- 
turing techniques we must ask, "What is being structured?" To 
put it more precisely, "What relation or relations are being con- 
trolled by the structuring mechanisms in question?" 

For name structures this relation is visibility, that is,
the relation that holds between a binding and a use of an
identifier when that use can refer to that binding. Thus, the
primitives from which name structures are assembled are bindings
and uses of identifiers, and the constructors used to assemble
these structures are mechanisms such as block structure.

How can we abstract the name structures from a programming 
language? Again, we can use syntax as a guide. In Figure 6 we 
show the fragments of Algol-60 syntax relevant to visibility. 
Irrelevant parts of the syntax have been elided. Each string 
generated by this grammar (ignoring reordering of declarations, 
etc.) defines a unique name structure, i.e., structural arrange- 
ment of visibility relations. In Figure 7 we have formulated a 
semantics oriented grammar for these relations. 

<identifier> ::= ....
<block> ::= <block head>; <compound tail>
<block head> ::= begin <declaration> | <block head>; <declaration>
<compound tail> ::= <statement> end
                  | <statement>; <compound tail>
<program> ::= <block> | <compound statement>
<procedure declaration> ::= [<type>] procedure
                            <proc.heading> <proc.body>
<proc.heading> ::= <proc.identifier> <formal par.part>;
<formal par.part> ::= ( <identifier> ,... )
<declaration> ::= <proc.decl.> | <other decl.>

Figure 6. A Fragment of Algol-60

program: executable
block: scope (declaration+, executable)
declaration: simple-decl | proc-decl
proc-decl: identifier X scope (simple-decl*, executable)
simple-decl: identifier
executable: {identifier | block}*

Figure 7. The Algol-60 Name System

Notice that, from the visibility standpoint, a procedure declara- 
tion is the same as a block; they both bind local identifiers and 
delimit a scope. Figure 8 shows the Algol-60 name system in 
diagrammatic form. The following figures (9-11) show the name
systems of the lambda calculus, FORTRAN and Pascal.
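The visibility relation generated by the Algol-60 name system of
Figure 7 can be explored with a small resolver. In this sketch an
executable is a list whose items are either identifier uses (strings)
or blocks, and a block is a pair of declared identifiers and a nested
executable. Only simple declarations are modelled; procedure
declarations are omitted, and the encoding and the function name are
our own assumptions.

```python
def unresolved(executable, visible=frozenset()):
    """Return the identifier uses to which no enclosing binding is visible."""
    free = set()
    for item in executable:
        if isinstance(item, str):
            # A use of an identifier: legal only if some binding is visible.
            if item not in visible:
                free.add(item)
        else:
            # block: scope (declaration+, executable)
            declarations, body = item
            free |= unresolved(body, visible | frozenset(declarations))
    return free


program = [
    "x",                                  # used outside any block
    (["x", "y"], [                        # outer block binds x and y
        "x",
        (["z"], ["y", "z", "w"]),         # inner block binds z
    ]),
]

print(sorted(unresolved(program)))
```

The inner block sees the bindings of every enclosing block, which is
exactly the nesting that the scope constructor expresses.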

Figure 9. The Lambda-Calculus Name System

Figure 10. The Pascal Name System

Figure 11. The FORTRAN Name System



In the last of these (Pascal), note that we have analyzed the
record declaration as a scope defining (or name grouping)
constructor. Figure 12 compares the complexities (as measured by
edge-count) of these name systems along with the complexities of
their type systems.





Figure 12. Complexities of Name and Type Systems
(panels: NAMES, TYPES)



6. Control Structures

Control structures are analyzed in the same way as the other 
structures. These are reflected in the equations and structure 
diagrams shown in Figures 13-16. 

Figure 13. Pascal Control Structures

Figure 15. FORTRAN Control Structures

Figure 16. BASIC Control Structures



Consider Pascal; the relevant parts of the grammar are shown
in Figure 17. These diagrams are somewhat deceptive because they
do not reflect the extraordinary complexity introduced into the
control structures by the goto statement. An analogous
complexity is caused in data structures by the pointer construct.
These are both examples of non-local references, whose proper
treatment remains an open question.

simple-statement: assign-stat | proc-stat | goto-stat | empty
assign-stat: expr
function-desig: call (fid, exprlist)
exprlist: expr*
expr: function-desig*
proc-stat: call (fid, {expr | fid}*)
goto-stat: goto (label)
statement: [label] X unlab-stat
unlab-stat: simple-statement | struc-stat
struc-stat: comp-stat | cond-stat | rep-stat | with-stat
comp-stat: statement+
cond-stat: if-stat | case-stat
if-stat: if (expr, stat, [stat])
case-stat: case (expr, case-list-element+)
case-list-element: const+ X statement
rep-stat: while-stat | repeat-stat | for-stat
while-stat: while (expr, stat)
repeat-stat: rep (stat+, expr)
for-stat: for (id, forlist, stat)
forlist: expr X [down] X expr
with-stat: with (expr+, stat)

Figure 17. Pascal Control Structure Grammar

7. Conclusions

The techniques we have described provide a simple, visual
method of comparing the structuring methods provided by
programming languages. Languages can often be ranked as to their
structural complexity by comparing the complexity of their
structural grammars or structure diagrams. In addition, the
diagrams allow the language designer to appraise the regularity
or irregularity of a structural subsystem and to identify areas
where they can be simplified.

Of course, it is very desirable to be able to quantify these
ideas, and there are many approaches to this quantification. One
of the simplest, which was used in this paper, was to count the
number of edges in the graph, since this reflects the
dependencies within the system. In the cases we have
investigated, this metric agrees with our informal evaluation.

There are, of course, other graph theoretic measures that
can be applied, for instance, variants of McCabe's Cyclomatic
Number [3], although which is the best remains an open question.
It is also possible to apply the measures of Halstead's "Software
Science" [1] to either the structural grammar or the structure
diagrams. This has also been tried, but this work is still in
progress [2].
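As an illustration of one such measure, the standard cyclomatic
number of a graph is E - N + 2P, where E is the number of edges, N
the number of nodes and P the number of connected components. The
sketch below computes it for a dependency graph given as a list of
edges. Which variant of the measure suits structure diagrams is not
settled here; the plain formula and the tiny example graphs are our
own choices for illustration.

```python
def cyclomatic_number(edges):
    """Compute E - N + 2P for a graph given as a list of (node, node) edges."""
    nodes = {node for edge in edges for node in edge}
    parent = {node: node for node in nodes}

    def find(node):
        # Union-find lookup with path halving.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in edges:
        parent[find(a)] = find(b)

    components = len({find(node) for node in nodes})
    return len(edges) - len(nodes) + 2 * components


# A purely hierarchical (tree-shaped) dependency graph.
tree = [("index-type", "scalar-type"), ("index-type", "subrange-type")]

# The same kind of graph with a recursive dependency: type -> array-type -> type.
recursive = [
    ("type", "array-type"),
    ("array-type", "index-type"),
    ("array-type", "type"),
]

print(cyclomatic_number(tree))
print(cyclomatic_number(recursive))
```

A tree-shaped graph scores 1, and each additional independent cycle,
such as a recursive type definition, raises the score by one.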

Although the proper measure to be applied remains an open
problem, the representation of structures in a measurable form,
such as the structure diagrams, is a first step towards
development of these metrics. Future research will attempt to
refine the analysis of structures and their representation as
graphs, and will attempt to develop appropriate measures of their
complexity.